ABSTRACT


A method based on the minimization of cross-entropy is presented for the recovery of signals from noisy data in the form of either time series or images. Finite Fourier transforms are applied to the data, and constraints are placed on the magnitude and phase of the Fourier coefficients based on their statistics for noise-only data. The minimization of cross-entropy is achieved through well-established functional minimization techniques, which allow further constraints in the spatial, temporal or frequency domain. Derivatives of the entropy function are obtained analytically, and the results are applied to the cases of correlated noise and of signal perturbations about a mean. Applications to one-dimensional data are demonstrated.




I.  INTRODUCTION 


A  common  problem  of  data  analysis  is  the  reconstruction  of  a  signal 
from  noisy  data.  The  objective  is  to  obtain  as  true  a  representation  as  the 
noisy  data  will  allow  without  having  exact  knowledge  of  the  event  giving 
rise  to  the  signal  or  of  its  consequences. 

Usually some knowledge of the signal is available. For example, if the data consist of a radar image of the ocean surface and the signal sought is an internal wave pattern, then a great deal is known about the physical properties of such waves. In some cases independent sea-surface measurements are available which allow estimation of wavelengths, amplitudes and velocities. The reconstruction of the wave pattern will be deemed unsatisfactory unless it conforms to this background knowledge. What is desired in such cases is a method of incorporating prior knowledge in the signal recovery process while maintaining a degree of flexibility consistent with the state of that knowledge and the reliability of the data.

In this paper a method of signal reconstruction is described which invokes the principle of minimum cross-entropy, a generalization of the principle of maximum entropy, and which incorporates prior knowledge in the form of constraints on the solutions to minimization problems. The method uses the finite Fourier transform, and constraints may be applied in either the spatial or the frequency domain. The solutions described herein were obtained using a general-purpose minimization routine and were restricted to one-dimensional data. For two-dimensional data, a faster special-purpose routine has been developed based on the same principles [1]. In discussing the method, reference will be made to image processing, since it was for this application that the method was envisioned. Analytical results given for the one-dimensional case can be extended easily to two dimensions.
II. ENTROPY AND THE WIENER FILTER


A digital time series, or image, which is the result of a stochastic process can be considered as a set of values, d(x), of limited extent and accuracy, which are assigned to a discrete set of locations $x_i$, $i = 0, 1, \ldots, M-1$. The values are required to be non-negative and are taken to be representative of intensities. The individual intensities can be considered to be the proportion of the total available intensity which is assigned to a particular location. The problem at hand is to arrive at an estimate of the signal values, s(x), when the data could be fitted equally well by many sets of estimates.


Given  that  this  problem  is  ill-posed  in  the  Hadamard  sense,  various 
schemes  are  available  for  defining  an  associated  problem  which  is  well-posed 
[2].  However,  no  method  can  be  said  to  be  optimum  in  the  wide  sense,  so  the 
selection  of  a  particular  method  depends  on  the  nature  of  the  knowledge 
available  as  well  as  on  the  experience  of  the  analyst.  When  noise  is  known 
to  be  a  major  component  of  the  data,  a  desirable  feature  is  the  availability 
of  a  related  measure  of  statistical  significance  which  ideally  manifests 
itself  as  a  parameter  defining  a  family  of  solutions. 


Entropy optimization is a powerful, general technique which provides solutions possessing many desirable properties [3], [4]. Maximum entropy implies "maximally smooth" and sometimes "maximally likely" as well. Such solutions are said to be the simplest possible result, containing the bare minimum of structure needed to fit the constraints imposed by the data [5]. Furthermore, it has been shown that optimum generalized entropy solutions uniquely possess certain properties of consistency in cases where the given data are subsequently supplemented [5]. Although the methods of maximum entropy and its generalization, minimum cross-entropy, may be subject to various interpretations, there is merit in the approach of Frieden [7], based on information theory, since this provides a useful context for analyzing attempts at signal reconstruction. A brief summary follows.
The classical (Shannon) definition of the information, I(A,B), contained in a message A about an event B is given by

$$ I(A,B) = \ln\left[\frac{P(B \mid A)}{P(B)}\right], \qquad (1) $$

where P represents the probability density function.

The event B may be taken to be a process which produces a continuous variable intensity, s, and the message A may be associated with a set of measurements, d, of the consequences of the event. The entropy, H, of the event is defined as

$$ H(B) = -\int_{-\infty}^{\infty} P(y) \ln P(y)\, dy. \qquad (2) $$

The performance of a measuring device is difficult to characterize in general. In practice it may often be considered to be band-limited, linear in response, and noisy. A common model describing its performance is

$$ d(x) = s(x) \ast g(x) + n(x), \qquad (3a) $$

where d represents the data,
s represents the signal,
n represents the noise, independent of s,
g represents the action of the instrument,
and $\ast$ represents the process of convolution through which this action is accomplished.

A Fourier transform of this equation yields

$$ D(\omega) = S(\omega)\,G(\omega) + N(\omega) = Y(\omega) + N(\omega), \qquad (3b) $$

where ω represents frequency, in either a spatial or a temporal sense, and the capital letters represent the Fourier transforms of the corresponding lower-case variables.
If restoration is sought through the multiplication of $D(\omega)$ by a weighting function, $W(\omega)$, and a subsequent inversion of the resulting transform data, it is well known that the minimum expected mean-square error in the restored signal is achieved by the so-called Wiener filter, given by

$$ W(\omega) = G^{-1}(\omega)\, \frac{r(\omega)}{1 + r(\omega)}, \qquad (4) $$

where $r(\omega) = |Y(\omega)|^2 / |N(\omega)|^2$ represents the data signal-to-noise power ratio at frequency ω.

Frieden [7] demonstrates that this filter may be derived from information-theoretic principles, in particular through maximization of the transinformation, $I(Y,D)$, defined as the difference between the entropies associated with $Y(\omega)$ and $N(\omega)$. Under the assumption that the noise is Gaussian, of zero mean and independent of the signal, these entropies can be shown to be functions only of the power spectral coefficients:

$$ H(Y) = 1 + \ln\left(\pi |D(\omega)|^2\right), \qquad (5) $$

$$ H(N) = 1 + \ln\left(\pi |N(\omega)|^2\right). $$

The maximum possible $I(Y,D)$ is termed the channel capacity, $C(\omega)$, and is a measure of the potential for restoration of the signal component at frequency ω under the given noise conditions. It can be shown that

$$ C(\omega) = \ln\left[1 + r(\omega)\right], \qquad (6) $$

so that $W(\omega) = \left[1 - \exp\{-C(\omega)\}\right] / G(\omega)$.

In the present paper the main concern is noise reduction, so $G(\omega)$ may be taken to be unity. In this situation the Wiener filter approach is to apply weights to the Fourier coefficients, reducing those coefficients where the signal-to-noise ratio is expected to be low (low channel capacity) while leaving relatively unaffected those whose signal-to-noise ratio is expected to be high. Phases are preserved at all frequencies, whereas some loss of signal power can be expected on average; this is the cost of the improvement in overall signal-to-noise ratio.
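The weighting just described can be sketched in a few lines of NumPy. This is a minimal illustration, not the report's implementation: it assumes G(ω) = 1, a real one-dimensional record, white Gaussian noise, and exactly known signal and noise power spectra, and the function names (`wiener_weights`, `channel_capacity`, `wiener_denoise`) are hypothetical. It also checks numerically that the weights of Eq. (4) equal 1 - exp(-C(ω)) with C(ω) from Eq. (6).

```python
import numpy as np


def wiener_weights(signal_power, noise_power):
    """Eq. (4) with G = 1: W = r / (1 + r), r = signal-to-noise power ratio."""
    r = signal_power / noise_power
    return r / (1.0 + r)


def channel_capacity(signal_power, noise_power):
    """Eq. (6): C = ln(1 + r)."""
    return np.log1p(signal_power / noise_power)


def wiener_denoise(data, signal_power, noise_power):
    """Weight the real-FFT coefficients of the data and invert.

    The weights are real and non-negative, so the phase of every
    retained coefficient is unchanged.
    """
    D = np.fft.rfft(data)
    W = wiener_weights(signal_power, noise_power)
    return np.fft.irfft(W * D, n=len(data))


# Hypothetical test case: a smooth signal plus white Gaussian noise.
rng = np.random.default_rng(1)
M, sigma = 64, 0.3
j = np.arange(M)
s = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * j / M)
d = s + sigma * rng.standard_normal(M)

signal_power = np.abs(np.fft.rfft(s)) ** 2                # assumed known exactly
noise_power = np.full(signal_power.shape, M * sigma**2)   # E|N_m|^2 for white noise

y = wiener_denoise(d, signal_power, noise_power)
mse_noisy = np.mean((d - s) ** 2)
mse_filtered = np.mean((y - s) ** 2)
print(f"MSE before: {mse_noisy:.4f}, after: {mse_filtered:.4f}")
```

Frequencies that carry no signal power receive zero weight, which is where most of the noise reduction comes from in this example.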

III.  RESTORATION  IN  THE  FOURIER  DOMAIN 

When a signal has undergone a convolution process it is convenient to treat the data after Fourier transformation. Even if, as here, the major concern is not deconvolution, there are good reasons for frequency-domain processing. When noise-only samples undergo a linear transformation such as the finite Fourier transform, the resulting coefficients are often nearly independent and Gaussian distributed, as a consequence of the central limit theorem [8]. Therefore amplitude and phase statistics are well approximated by well-known distributions, regardless of the noise distribution in the spatial domain. In other words, frequency-domain processing tends to be robust.
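A quick numerical check of this central-limit argument is sketched below. It assumes uniformly distributed (hence strongly non-Gaussian) white noise; the record length, number of trials and frequency index are arbitrary choices. The excess kurtosis of a Gaussian is zero, whereas that of a uniform distribution is -1.2.

```python
import numpy as np


def excess_kurtosis(x):
    """Sample excess kurtosis (0 for a Gaussian, -1.2 for a uniform distribution)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0


rng = np.random.default_rng(2)
M, trials, m = 64, 20000, 5
noise = rng.uniform(-1.0, 1.0, size=(trials, M))   # non-Gaussian noise-only samples
coef = np.fft.fft(noise, axis=1)[:, m]             # one Fourier coefficient per record

k_noise = excess_kurtosis(noise.ravel())
k_real = excess_kurtosis(coef.real)
k_imag = excess_kurtosis(coef.imag)
corr = np.corrcoef(coef.real, coef.imag)[0, 1]     # real and imaginary parts nearly uncorrelated
print(f"spatial: {k_noise:.2f}, Fourier real: {k_real:.2f}, "
      f"imag: {k_imag:.2f}, corr: {corr:.3f}")
```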

Wiener filtering is widely used for restoration [9], even when knowledge of the signal spectrum is inexact; the analyst must then accept less than optimal results, which may or may not be satisfactory. Often crude approximations to the signal spectrum are sufficient. For unknown signals some property may be chosen as characteristic of a desirable solution, to serve as the basis for a criterion of optimality. Candidate solutions are constrained to conform to the data within limits based on the noise statistics. Methods based on least-squares constraints seek solutions for which the estimated error variance matches that of the noise.

Hunt [10] developed a method for which the criterion of optimality was the minimization of the square of the Laplacian, a smoothness criterion implying that the signal was basically of low-frequency content. Gull and Daniell [11] employed another criterion for smoothness, maximum entropy, combined with a measure of expected error. Their method seeks solutions which minimize $Q(\lambda)$, defined by
$$ Q(\lambda) = \sum_j f_j \ln f_j + \lambda \sum_{\omega} \frac{|Y(\omega) - D(\omega)|^2}{\sigma^2(\omega)}, \qquad (7) $$

where $f_j$ represents the intensity at the j-th location,
λ represents a positive Lagrangian multiplier,
Y represents the estimated transform,
D represents the data transform,
σ represents the noise standard deviation,

and the sum over ω is taken over a set of preselected frequencies.

The solution is obtained by an iterative procedure, the correct value of λ being that for which the sum over ω achieves its expected value. Candidate solutions can be assigned confidence levels based on the noise statistics.

Direct subtraction of noise power in the frequency domain is a simple method of noise reduction, perhaps best described by Lim [12]. The estimated signal Fourier coefficient, $Y(\omega)$, is defined by

$$ |Y(\omega)|^2 = \begin{cases} |D(\omega)|^2 - \alpha\,\sigma^2(\omega), & \text{if positive}, \\ 0, & \text{otherwise}. \end{cases} \qquad (8) $$

The phase of $Y(\omega)$ is taken to be the phase of $D(\omega)$. If $D(\omega)$ equals $S(\omega) + N(\omega)$, then $|D(\omega)|^2/\sigma^2(\omega)$ is well approximated by a noncentral chi-squared distribution (two degrees of freedom) of mean $|S|^2/\sigma^2 + 2$ and variance $4|S|^2/\sigma^2 + 4$ [13].
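The stated mean and variance can be checked by simulation. The sketch assumes that σ²(ω) is the variance of each of the real and imaginary parts of N(ω), which is the normalization under which the quoted moments hold; the values of S and σ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 0.5                        # std. dev. of Re N and of Im N (assumed normalization)
S = 1.2 * np.exp(0.7j)             # arbitrary fixed signal coefficient
n = 400_000
N = sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
X = np.abs(S + N) ** 2 / sigma**2  # normalized data power |D|^2 / sigma^2

lam = abs(S) ** 2 / sigma**2       # noncentrality |S|^2 / sigma^2
mean_theory = lam + 2.0            # noncentral chi-squared with 2 degrees of freedom
var_theory = 4.0 * lam + 4.0
print(X.mean(), mean_theory, X.var(), var_theory)
```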

Normalization of Eq. (8) yields

$$ \frac{|Y(\omega)|^2}{\sigma^2(\omega)} = \frac{|D(\omega)|^2}{\sigma^2(\omega)} - \alpha, \qquad (9) $$

where the left-hand side has a mean-shifted noncentral chi-squared distribution, with expected value $|S|^2/\sigma^2 + 2 - \alpha$. If α > 2, the expected value is therefore less than $|S|^2/\sigma^2$, so some loss of signal power results on average. On the other hand, the larger α, the more likely it is that the residual power is due to the presence of a signal at the given frequency.
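A minimal sketch of the power-subtraction rule of Eqs. (8) and (9) follows. It assumes a real one-dimensional record and a known per-component noise variance σ²; the function name `power_subtraction` and the test signal are hypothetical. For real white noise of variance s_n², each real-FFT coefficient (other than the zero-frequency and Nyquist terms, which the sketch treats the same way for simplicity) has real and imaginary parts of variance M s_n²/2, which is the σ² used here.

```python
import numpy as np


def power_subtraction(data, sigma2, alpha):
    """Eq. (8): |Y|^2 = |D|^2 - alpha*sigma2 where positive, else 0.

    The phase of each output coefficient is the phase of D, as in the text.
    """
    D = np.fft.rfft(data)
    power = np.abs(D) ** 2 - alpha * sigma2
    magnitude = np.sqrt(np.clip(power, 0.0, None))
    Y = magnitude * np.exp(1j * np.angle(D))
    return np.fft.irfft(Y, n=len(data)), D, Y


rng = np.random.default_rng(4)
M, noise_sd, alpha = 64, 0.1, 4.0
j = np.arange(M)
s = 0.5 + 0.2 * np.cos(2 * np.pi * 3 * j / M)
d = s + noise_sd * rng.standard_normal(M)
sigma2 = M * noise_sd**2 / 2.0          # per-component variance of each noise coefficient

y, D, Y = power_subtraction(d, sigma2, alpha)
kept = np.abs(Y) > 0
print("coefficients kept:", kept.sum(), "of", kept.size)
print("MSE before:", np.mean((d - s) ** 2), "after:", np.mean((y - s) ** 2))
```

With α = 4 a noise-only coefficient survives only when its normalized power exceeds 4, which for an exponentially distributed variable of mean 2 happens with probability exp(-2), roughly 14% of the time.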
The loss of signal power under these circumstances may not be critical, since it is the phase spectrum which determines the general features of an image [14]. Acceptance of the phase spectrum of the data implies acceptance of these general features, which, in turn, suggests a high signal-to-noise ratio. The accuracy of the phase estimate is a function of the signal-to-noise power ratio and therefore of the channel capacity at a given frequency. It is shown below that cross-entropy is a function of the phase when prior spatial information on the signal is introduced.

For the method proposed in the next section, constraints are applied in both the spatial and frequency domains, while Fourier magnitudes and phases are allowed to vary independently. Obviously, constraints may be chosen so that no solution exists. A trivial example would be the case in which only a single frequency component is allowed while the signal is required to be zero over a certain interval. Furthermore, it is well known that if a signal is known to satisfy certain constraints, that signal may be specified uniquely by partial Fourier-domain information such as the transform magnitude [15] or phase, or even the sign of its real part [16]. These issues of existence and uniqueness are a subject of continuing research [2].

IV. A CONSTRAINED MINIMUM CROSS-ENTROPY METHOD

In this section the proposed method is described. Assume the data are a set of intensities, $d_j$, $j = 0, 1, 2, \ldots, M-1$, which are the sums of a signal component, $s_j$, and an independent noise component, $n_j$. It is desired to find a representation of the signal which is consistent with the data, given estimates of the means and variances of the noise power spectral coefficients and of the phase spectrum at frequencies $\omega_j$, $j = 0, 1, \ldots, M/2$. (The phase data are best expressed in terms of the arctangent of the estimated phase, which tends to be normally distributed [6].) The proposed method of solution is as follows. Obtain the finite Fourier transform of the data, $D(\omega_j)$. Let $S(\omega_j)$ represent the estimate of the Fourier coefficient of the signal at $\omega_j$. Select as a first estimate of $|S(\omega_j)|^2$ the value of $|D(\omega_j)|^2 - |N(\omega_j)|^2$ if positive; otherwise take it to be zero. The first estimate of the phase is the phase of $D(\omega_j)$. Place bounds on subsequent estimates of magnitude and phase using the prior knowledge of the noise spectral properties. Now seek a solution which minimizes the generalized entropy; this is either the maximization of entropy or the minimization of cross-entropy, as appropriate. It is assumed that the sum $\sum_j s_j$ is equal to a given constant and that the signal components are all positive.
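The initialization steps just listed can be sketched as follows. The sketch assumes a real one-dimensional record and uses noise-only records (as in the examples of Section VIII) to estimate the mean and standard deviation of the noise power at each frequency. The magnitude bounds follow a hypothetical rule, |D|² plus or minus `n_sd` standard deviations of the noise power; only magnitude bounds are shown, and the phase bounds the method also requires would be set analogously from the noise phase statistics. All function names, the parameter `n_sd` and the test signal are illustrative assumptions, not the report's implementation.

```python
import numpy as np


def noise_power_statistics(noise_records):
    """Mean and std. dev. of |N(w_j)|^2 at each frequency, from noise-only records."""
    P = np.abs(np.fft.rfft(noise_records, axis=1)) ** 2
    return P.mean(axis=0), P.std(axis=0, ddof=1)


def initial_estimate_and_bounds(data, noise_mean, noise_sd, n_sd=3.0):
    D = np.fft.rfft(data)
    power = np.abs(D) ** 2
    # Hypothetical magnitude bounds: data power plus or minus n_sd noise-power std. devs.
    lower = np.sqrt(np.clip(power - n_sd * noise_sd, 0.0, None))
    upper = np.sqrt(power + n_sd * noise_sd)
    # First estimates from the text: |S|^2 = |D|^2 - |N|^2 if positive (else 0),
    # phase = phase of D. The magnitude is clipped so it lies within the bounds.
    magnitude0 = np.sqrt(np.clip(power - noise_mean, 0.0, None))
    magnitude0 = np.clip(magnitude0, lower, upper)
    phase0 = np.angle(D)
    return magnitude0, phase0, lower, upper


rng = np.random.default_rng(5)
M, n_records, noise_sd_spatial = 64, 56, 0.1
noise_records = noise_sd_spatial * rng.standard_normal((n_records, M))
j = np.arange(M)
signal = 0.5 + 0.1 * (np.abs(j - M / 2) < 8)       # hypothetical step-like signal
data = signal + noise_sd_spatial * rng.standard_normal(M)

mu_N, sd_N = noise_power_statistics(noise_records)
mag0, phase0, lo, hi = initial_estimate_and_bounds(data, mu_N, sd_N)
```

Clipping the first estimate into the bounds guarantees that it lies within the prescribed range, which is the only requirement the text places on the starting point.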

To achieve a solution, a general-purpose quasi-Newton minimization routine was chosen from the NAG library [17]. If the bounds on the frequency-domain variables are "tight", this implies that the signal-to-noise ratio is high, and the solution is required to conform closely to the data. If the bounds are "loose", the solution may depart significantly from the data values, now assumed to be noisy, and will tend to conform to the prior knowledge.

Additional constraints may be applied on a point-by-point basis in either domain, so prior knowledge can be incorporated into the solution with ease. If, for example, it is known that the signal has a certain minimum total power, the solution may be constrained so that this requirement is fulfilled. If the signal is known to decrease in amplitude as j increases, this may be translated into a decreasing upper bound in the original data domain.

Any initial estimate of $S(\omega_j)$ is acceptable, provided only that it lies within the prescribed range. Generally, Wiener filter solutions provide initial estimates closer to the final solution than do those obtained from the noise subtraction process described above. This is to be expected, since the Wiener filter is based upon exact knowledge of the signal spectrum.
It should be noted that constraints in the frequency domain can be applied to the real and imaginary parts of the transform coefficients rather than to their phase and magnitude. Results presented elsewhere [1] were obtained in this manner. The domain of acceptability is not the same for the two methods.

Examples of the application of the algorithm to one-dimensional simulated data are given in Section VIII. The next three sections present analytical results which provide some insight into the process and the solutions expected.

V.  ENTROPY  DERIVATIVES 

The quasi-Newton minimization technique requires knowledge of the derivatives of the objective function with respect to the constraint variables [17]. The derivatives of the entropy function with respect to Fourier magnitude and phase can be obtained in a straightforward manner from the properties of the finite Fourier transform. For a function, f, defined at a discrete set of equispaced values $f_j$, $j = 0, 1, 2, \ldots, M-1$, its transform, $F_m(f)$, may be defined as

$$ F_m(f) = \sum_{j=0}^{M-1} f_j \exp\left(-i\,\frac{2\pi j m}{M}\right) = a_m \exp(i\phi_m), \qquad (10) $$

where $a_m$ and $\phi_m$ are thus defined as the magnitude and phase of the Fourier coefficients. Let the generalized entropy function, H, be defined by

$$ H = \sum_{j=0}^{M-1} f_j \ln\left(\frac{f_j}{b_j}\right), \qquad (11) $$

where the $b_j$ represent the assumed (prior) values of the solution. For uniform $b_j$, H is the negative of the sample entropy (up to an additive constant, since $\sum_j f_j$ is fixed), whereas in general H is the sample cross-entropy. H and $f_j$ may be defined as functions of the $a_m$ and $\phi_m$ by applying inverse Fourier transformations to Eq. (10).

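For $m = 1, 2, \ldots, (M/2)-1$, differentiation of Eq. (11) shows that

$$ \left[\frac{\partial H}{\partial \ln a_m},\ \frac{\partial H}{\partial \phi_m}\right] = \frac{2}{M}\, F_m(w), \qquad (12) $$

where

$$ w_k = \sum_{j=0}^{M-1} f_j \ln\left(\frac{f_{j+k}}{b_{j+k}}\right), $$

with indices taken modulo M, and where the quantities inside the square brackets represent the real and imaginary parts of a complex quantity, respectively.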

For $m = M/2$, the factor 2 is replaced by unity. The derivative for m equal to zero is not required, since the mean value of the $f_j$ is fixed. Furthermore, from the convolution property of the transform, it follows that

$$ F_m(w) = F_m^*(f)\, F_m\!\left(\ln(f/b)\right), \qquad (13) $$

where the asterisk denotes complex conjugation.

Generally, $\partial H/\partial \phi_m$ is non-zero, since the phase of $F_m(f)$ will not match exactly that of $F_m(\ln(f/b))$, even if b is uniform.
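Equations (12) and (13) are easy to check numerically. The sketch below assumes an even record length M and NumPy's `fft`/`rfft` sign convention, which uses the same exp(-i2πjm/M) kernel as Eq. (10). It compares the analytic derivatives with central finite differences taken with respect to ln a_m and φ_m, and verifies Eq. (13) by forming the circular cross-correlation w_k directly. The helper names and the test record and prior are hypothetical.

```python
import numpy as np


def cross_entropy(f, b):
    """Eq. (11): H = sum_j f_j ln(f_j / b_j)."""
    return np.sum(f * np.log(f / b))


def f_from_polar(a, phi, M):
    """Real record from real-FFT magnitudes a_m and phases phi_m (inverse of Eq. (10))."""
    return np.fft.irfft(a * np.exp(1j * phi), n=M)


def analytic_grad(f, b, m):
    """Eqs. (12)-(13): c/M times [Re, Im] of F_m*(f) F_m(ln(f/b)).

    c = 2 for 0 < m < M/2 and c = 1 for m = M/2 (M assumed even).
    """
    M = len(f)
    c = 1.0 if m == M // 2 else 2.0
    Fw = np.conj(np.fft.fft(f)[m]) * np.fft.fft(np.log(f / b))[m]
    return c / M * Fw.real, c / M * Fw.imag


def numeric_grad(a, phi, b, m, h=1e-6):
    """Central differences of H with respect to ln a_m and phi_m."""
    M = len(b)

    def H(a_, phi_):
        return cross_entropy(f_from_polar(a_, phi_, M), b)

    a_up, a_dn = a.copy(), a.copy()
    a_up[m] *= np.exp(h)
    a_dn[m] *= np.exp(-h)
    p_up, p_dn = phi.copy(), phi.copy()
    p_up[m] += h
    p_dn[m] -= h
    return ((H(a_up, phi) - H(a_dn, phi)) / (2 * h),
            (H(a, p_up) - H(a, p_dn)) / (2 * h))


rng = np.random.default_rng(6)
M = 16
F0 = np.fft.rfft(1.0 + 0.3 * rng.random(M))            # a positive test record
a, phi = np.abs(F0), np.angle(F0)
f = f_from_polar(a, phi, M)
b = 1.0 + 0.2 * np.cos(2 * np.pi * np.arange(M) / M)   # a non-uniform prior

# Eq. (13): w_k = sum_j f_j ln(f_{j+k} / b_{j+k}) with indices modulo M.
g = np.log(f / b)
w = np.array([np.sum(f * np.roll(g, -k)) for k in range(M)])

for m in (1, 3, M // 2):
    print(m, numeric_grad(a, phi, b, m), analytic_grad(f, b, m))
```

The sum of ∂f_j/∂a_m over j vanishes for m ≠ 0, which is why the "+1" in ∂(f ln f)/∂f does not appear in the result.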

The derivatives with respect to the real and imaginary parts, $U_m$ and $V_m$, of the Fourier coefficients are given by

$$ \left[\frac{\partial H}{\partial U_m},\ \frac{\partial H}{\partial V_m}\right] = \frac{2}{M}\, F_m\!\left(\ln(f/b)\right) \qquad (14) $$

for $m = 1, 2, \ldots, (M/2)-1$. These derivatives become zero when the transform coefficient of $\ln(f/b)$ becomes zero, whereas the derivatives with respect to magnitude and phase are zero also when $F_m(f)$ is zero, independent of the prior knowledge. For generalized entropy the solutions sought are those for which the variations in f match those of b.
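Eq. (14) and the remark that follows can be checked the same way. The sketch below (hypothetical helper names, M even, NumPy sign convention as in Eq. (10)) perturbs the real and imaginary parts of one real-FFT coefficient, and also confirms that every derivative vanishes when f is proportional to b, so that ln(f/b) is constant.

```python
import numpy as np


def cross_entropy(f, b):
    """Eq. (11): H = sum_j f_j ln(f_j / b_j)."""
    return np.sum(f * np.log(f / b))


def grad_uv(f, b, m):
    """Eq. (14): (dH/dU_m, dH/dV_m) = (2/M) [Re, Im] of F_m(ln(f/b)), 0 < m < M/2."""
    M = len(f)
    G = np.fft.fft(np.log(f / b))[m]
    return 2.0 / M * G.real, 2.0 / M * G.imag


def numeric_grad_uv(f, b, m, h=1e-6):
    """Central differences with respect to U_m = Re F_m(f) and V_m = Im F_m(f)."""
    M = len(f)
    F = np.fft.rfft(f)
    unit = np.zeros_like(F)
    unit[m] = 1.0
    grads = []
    for step in (h, 1j * h):
        f_up = np.fft.irfft(F + step * unit, n=M)
        f_dn = np.fft.irfft(F - step * unit, n=M)
        grads.append((cross_entropy(f_up, b) - cross_entropy(f_dn, b)) / (2 * h))
    return tuple(grads)


rng = np.random.default_rng(7)
M = 16
f = 1.0 + 0.3 * rng.random(M)
b = 1.0 + 0.2 * np.sin(2 * np.pi * np.arange(M) / M)
f_matched = 1.3 * b                      # variations in f match those of b

print(numeric_grad_uv(f, b, 2), grad_uv(f, b, 2))
print(grad_uv(f_matched, b, 2))          # both components are zero to rounding error
```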


By definition $w_0$ is the cross-entropy with prior b. Otherwise $w_k$ may be expressed in the form

$$ w_k = \sum_{j=0}^{M-1} f_j \ln\left(\frac{f_j}{b_{j+k}}\right) - \sum_{j=0}^{M-1} f_j \ln\left(\frac{f_j}{f_{j+k}}\right). \qquad (15) $$

The first term represents the cross-entropy for a shifted prior. The second term is a sample "auto-cross-entropy", in which the shifted prior is replaced by the actual value measured.
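The split of w_k into a shifted-prior cross-entropy minus a sample auto-cross-entropy is an exact identity, which the following sketch confirms numerically under circular (modulo-M) indexing. The function names and test data are hypothetical.

```python
import numpy as np


def w_lag(f, b, k):
    """w_k = sum_j f_j ln(f_{j+k} / b_{j+k}), indices modulo M."""
    return np.sum(f * np.log(np.roll(f, -k) / np.roll(b, -k)))


def w_lag_split(f, b, k):
    """Cross-entropy with the shifted prior, and the sample auto-cross-entropy."""
    shifted_prior_term = np.sum(f * np.log(f / np.roll(b, -k)))
    auto_cross_term = np.sum(f * np.log(f / np.roll(f, -k)))
    return shifted_prior_term, auto_cross_term


rng = np.random.default_rng(8)
M = 32
f = 0.5 + rng.random(M)
b = 0.8 + 0.4 * rng.random(M)

for k in (0, 1, 5):
    first, second = w_lag_split(f, b, k)
    print(k, w_lag(f, b, k), first - second)
```

At k = 0 the auto-cross-entropy term is zero and w_0 reduces to the cross-entropy H of Eq. (11).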


VI.  ENTROPY  FOR  SIGNAL  AND  NOISE  PERTURBATIONS 


Suppose the data, $f_j$, lie between zero and unity and may be represented as a mean value, μ, minus small variations, $z_j$, equal to the sum of a signal $s_j$ and zero-mean noise $n_j$, independent of the signal. At the j-th location, to second order in $\delta_j$,

$$ \ln(f_j/b_j) = \ln\left[(\mu - z_j)/b_j\right] = \ln\mu - 2\delta_j - 2\delta_j^2 - \ln b_j, \qquad (16) $$

where $\delta_j = (s_j + n_j)/2\mu$.

It can be shown that, for $m = 1, 2, \ldots, (M/2)-1$,

$$ \langle F_m^*(f)\, F_m(\ln(f/b)) \rangle = 4\mu \langle |F_m(\delta)|^2 \rangle + 2\mu \langle F_m^*(\delta) \rangle\, F_m(\ln b), \qquad (17) $$

where ⟨ ⟩ denotes expected value.


The second term generally has an imaginary part, which introduces a phase dependence into the generalized entropy. However, if the $b_j$ are uniform, then

$$ \langle F_m^*(f)\, F_m(\ln f) \rangle = \mu^{-1}\left[|S_m|^2 + \langle |N_m|^2 \rangle\right] = \mu^{-1} \langle a_m^2 \rangle, \qquad (18) $$

where $S_m$ and $N_m$ represent $F_m(s)$ and $F_m(n)$, respectively.

To achieve maximum entropy, the amplitude at each frequency is reduced, on average, in proportion to the power at that frequency, which is the total of signal plus noise power. This is consistent with Frieden's analysis outlined above, since $\ln(1 + \langle a_m^2 \rangle)$ is approximately $\langle a_m^2 \rangle$ for small perturbations.

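If the b-function represents the signal component correctly to within an amplitude factor ε, $b_j$ equals $\mu - \epsilon s_j$, and

$$ \langle F_m^*(f)\, F_m(\ln(f/b)) \rangle = 4\mu \langle |F_m(\delta)|^2 \rangle - 2\mu \langle F_m^*(\delta) \rangle\, F_m(\ln(\mu/b)) = \mu^{-1}\left[(1 - \epsilon)|S_m|^2 + \langle |N_m|^2 \rangle\right]. \qquad (19) $$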

Phase dependence is absent, since the phase variations due to the signal are matched to those in the prior knowledge. If the amplitude is correctly estimated, ε equals unity and the cross-entropy derivatives on average depend on the noise power only, so minimization of cross-entropy entails a reduction of the noise component while maintaining the signal component. If the amplitude is incorrect, then some loss of signal power is expected, the amount depending on the degree of mismatch, so the minimization of cross-entropy is equivalent to a reduction of power representing the mismatch between the prior function and the data.
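Equations (18) and (19) can be checked by Monte Carlo simulation. The sketch below assumes a single sinusoidal signal component at frequency index m, white Gaussian noise, and perturbations small compared with μ; ε = 0 reproduces the uniform-prior case of Eq. (18). All parameter values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(9)
M, trials, m = 64, 40000, 3
mu, amp, noise_sd = 0.5, 0.005, 0.01
j = np.arange(M)
s = amp * np.cos(2 * np.pi * m * j / M)            # small signal variation
n = noise_sd * rng.standard_normal((trials, M))    # zero-mean white noise
f = mu - (s + n)                                   # mean minus small variations

S_m = np.fft.fft(s)[m]
noise_power = M * noise_sd**2                      # <|N_m|^2> for white noise
F_f = np.fft.fft(f, axis=1)[:, m]


def simulated(b):
    """Monte Carlo estimate of <F_m*(f) F_m(ln(f/b))>."""
    F_log = np.fft.fft(np.log(f / b), axis=1)[:, m]
    return np.mean(np.conj(F_f) * F_log)


def predicted(eps):
    """Eq. (19): mu^-1 [(1 - eps)|S_m|^2 + <|N_m|^2>]; eps = 0 gives Eq. (18)."""
    return ((1.0 - eps) * abs(S_m) ** 2 + noise_power) / mu


for eps in (0.0, 0.5, 1.0):
    b = mu - eps * s                               # prior matching the signal up to eps
    print(eps, simulated(b).real, predicted(eps))
```

With ε = 1 the signal power drops out and only the noise term remains, as stated in the text.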

VII.  ENTROPY  DERIVATIVES  FOR  CORRELATED  NOISE  MODELS 

In this section it is shown that for correlated noise, entropy has a phase dependency. Let $u_j$ represent variables which are independent and uniformly distributed over (0,1). From these a family of noise models can be derived using the relationship

$$ v_j = \alpha u_j + \beta u_{j-1}, \qquad (20) $$

where α is a parameter between 0 and 1 and β represents 1 - α.
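The mean and variance of $v_j$ are 1/2 and $(\alpha^2 + \beta^2)/12$, respectively. The correlation at lag 1 is $\alpha\beta/(\alpha^2 + \beta^2)$, and it is zero for greater lags. Through straightforward integration, analytic expressions can be obtained for $\langle v_j \ln v_{j+k} \rangle$, denoted by $q_k(\alpha)$, in order to calculate $\langle F_m(w) \rangle$. In particular,

$$ q_0(\alpha) = -\frac{5}{12} - \frac{1}{6}\left(\frac{\alpha^2}{\beta}\ln\alpha + \frac{\beta^2}{\alpha}\ln\beta\right), $$

$$ q_1(\alpha) = -\frac{3}{4} + \frac{\alpha}{6\beta} + \frac{\alpha\ln\alpha\,(6\alpha - 3 - \alpha^2)}{12\beta^2} - \frac{\beta\ln\beta\,(\alpha + 3)}{12\alpha}, $$

$$ q_{M-1}(\alpha) = -\frac{7}{12} + \frac{\beta - \alpha}{6\alpha} + \frac{\alpha\ln\alpha\,(\alpha - 4)}{12\beta} + \frac{\beta\ln\beta\,(2 - 4\alpha - \alpha^2)}{12\alpha^2}, $$

where $q_{M-1}(\alpha) = \langle v_j \ln v_{j-1} \rangle$ is the lag -1 value, and

$$ q(\alpha) \equiv q_k(\alpha) = q_{M-k}(\alpha) = -\frac{3}{4} - \frac{\alpha\ln\alpha}{4\beta} - \frac{\beta\ln\beta}{4\alpha} $$

for k not equal to 0 or 1.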


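Furthermore, for $m = 1, 2, \ldots, (M/2)-1$,

$$ \langle F_m(w(\alpha)) \rangle = M Q_0(\alpha) + (M-1)\, Q_1(\alpha) \cos\frac{2\pi m}{M} + i\,(M-1)\, Q_2(\alpha) \sin\frac{2\pi m}{M}, \qquad (21) $$

where

$$ Q_0(\alpha) = q_0(\alpha) - q(\alpha), \qquad Q_1(\alpha) = q_1(\alpha) + q_{M-1}(\alpha) - 2q(\alpha), \qquad Q_2(\alpha) = q_{M-1}(\alpha) - q_1(\alpha). $$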

$Q_0$ and $Q_1$ are positive for 0 < α < 1, and $Q_2$ is small compared with the other two. Also $Q_2(\beta) = -Q_2(\alpha)$, so that $Q_2$ vanishes at α = 1/2, whereas the other two functions are symmetric. The imaginary part depends upon $\langle v_j \ln v_{j+1} \rangle - \langle v_{j+1} \ln v_j \rangle$, which reflects asymmetry with respect to the direction of the spatial coordinate.
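The closed forms above can be checked against simulation. The sketch codes q_0, q_1, the lag -1 value q_{M-1} and q for the model v_j = αu_j + βu_{j-1}, estimates the same expectations from a long simulated sequence, and evaluates Q_0, Q_1 and Q_2, taking Q_2 = q_{M-1} - q_1. The choice α = 0.7, the function names and the tolerances are arbitrary assumptions.

```python
import numpy as np


def q0(a):
    """<v_j ln v_j>."""
    b = 1.0 - a
    return -5 / 12 - (a**2 / b * np.log(a) + b**2 / a * np.log(b)) / 6


def q1(a):
    """<v_j ln v_{j+1}>."""
    b = 1.0 - a
    return (-3 / 4 + a / (6 * b)
            + a * np.log(a) * (6 * a - 3 - a**2) / (12 * b**2)
            - b * np.log(b) * (a + 3) / (12 * a))


def q_lag_minus1(a):
    """<v_j ln v_{j-1}> (written q_{M-1} in the text); the mirror image of q1."""
    return q1(1.0 - a)


def q_far(a):
    """<v_j ln v_{j+k}> for |k| >= 2, i.e. <v> <ln v>."""
    b = 1.0 - a
    return -3 / 4 - a * np.log(a) / (4 * b) - b * np.log(b) / (4 * a)


alpha = 0.7
rng = np.random.default_rng(10)
u = rng.random(1_000_001)
v = alpha * u[1:] + (1 - alpha) * u[:-1]           # v_j = alpha u_j + beta u_{j-1}

simulated = {
    "q0": np.mean(v * np.log(v)),
    "q1": np.mean(v[:-1] * np.log(v[1:])),
    "q_lag_minus1": np.mean(v[1:] * np.log(v[:-1])),
    "q_far": np.mean(v[:-2] * np.log(v[2:])),
}
closed_form = {name: fn(alpha) for name, fn in
               [("q0", q0), ("q1", q1), ("q_lag_minus1", q_lag_minus1), ("q_far", q_far)]}

Q0 = q0(alpha) - q_far(alpha)
Q1 = q1(alpha) + q_lag_minus1(alpha) - 2 * q_far(alpha)
Q2 = q_lag_minus1(alpha) - q1(alpha)
print(simulated, closed_form, Q0, Q1, Q2)
```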
A related model for negatively correlated noise is given by

$$ v_j = -\alpha u_j + \beta u_{j-1} + \alpha, \qquad (22) $$

for which the correlation function of $v_j$ at lag 1 is $-\alpha\beta/(\alpha^2 + \beta^2)$. It can be shown that

$$ \langle F_m(w(-\alpha)) \rangle = M Q_0(\alpha) - (M-1)\, Q_1(\alpha) \cos\frac{2\pi m}{M} - i\,(M-1)\, Q_2(\alpha) \sin\frac{2\pi m}{M} \qquad (23) $$

for 0 < α < 1.

As a function of frequency, the real part of $\langle F_m(w(\alpha)) \rangle$ decreases with m, whereas that of $\langle F_m(w(-\alpha)) \rangle$ increases. From the relationship given in Eq. (12), it can be seen that, to increase the entropy, larger decreases in amplitude are made at low frequencies than at high frequencies when the correlation is positive; the action is that of a high-pass filter. For negative correlation the opposite applies, so the action is that of a low-pass filter. For uncorrelated noise the derivatives are independent of frequency, which indicates that a uniform decrease in power is required over the flat spectrum.

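In general, the relationship between the auto-correlation function and the auto-cross-entropy function is reflected in a relationship between the power spectrum and the entropy derivatives. If $q_k$ equals $\langle f_j \ln f_{j+k} \rangle$ for signed lag k, then

$$ \langle F_m(w) \rangle = M q_0 + \frac{M}{2}\left(q_{M/2} + q_{-M/2}\right)\cos \pi m + \sum_{k=1}^{M/2-1}\left[(M-k)(q_k + q_{-k}) + k\,(q_{k-M} + q_{M-k})\right]\cos\frac{2\pi m k}{M} + i\sum_{k=1}^{M/2-1}\left[(M-k)(q_{-k} - q_k) + k\,(q_{M-k} - q_{k-M})\right]\sin\frac{2\pi m k}{M}. \qquad (24) $$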


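If $\gamma_k$ represents $\langle (f_j - \mu)(f_{j+k} - \mu) \rangle$, where μ represents the mean of f, then the Fourier transform of $\sum_{j=0}^{M-1} \langle (f_j - \mu)(f_{j+k} - \mu) \rangle$ yields the power spectrum, for m ≠ 0, in the form

$$ \langle |F_m(f)|^2 \rangle = M \sum_{k=0}^{M-1} \gamma_k \cos\frac{2\pi m k}{M}. \qquad (25) $$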

This result may be obtained by replacing the q's in Eq. (24) by the corresponding γ's and reducing the result through application of the symmetry conditions $\gamma_k = \gamma_{M-k} = \gamma_{-k}$. A significant correlation at lag k, say, will be reflected in a relatively large value of the auto-cross-entropy, but the relationship is not simply stated because of the non-linearity introduced by the ln function.
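For a single record, the circular analogue of Eq. (25) holds exactly, which makes it easy to check: with x the mean-removed record and c_k = Σ_j x_j x_{j+k} its circular autocovariance sums (indices modulo M, so that c_k = c_{M-k} by construction), |F_m(f)|² = M Σ_k (c_k/M) cos(2πmk/M) for m ≠ 0. The expectation form follows by averaging over records. The test record is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(11)
M = 32
f = 1.0 + 0.2 * rng.standard_normal(M)
x = f - f.mean()

# Circular autocovariance sums c_k = sum_j x_j x_{j+k}, indices modulo M.
c = np.array([np.sum(x * np.roll(x, -k)) for k in range(M)])
gamma = c / M

m = np.arange(M)
k = np.arange(M)
cos_table = np.cos(2 * np.pi * np.outer(m, k) / M)
spectrum_from_gamma = M * cos_table @ gamma        # right-hand side of Eq. (25)
spectrum_direct = np.abs(np.fft.fft(f)) ** 2       # |F_m(f)|^2
print(np.max(np.abs(spectrum_from_gamma[1:] - spectrum_direct[1:])))
```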

VIII.  ILLUSTRATIVE  EXAMPLES 

In this section results are presented to illustrate the characteristics of signal reconstructions under various conditions of prior knowledge and constraints. The original data, the signal to be recovered, and various solutions are shown in Fig. 1. Because of its large steps, the function is not particularly well suited to Fourier analysis; on the other hand, it is largely of low-frequency content and is symmetric. The noise is additive Gaussian noise of zero mean and standard deviation equal to the size of the steps in the signal.

The signal is represented by one data set of 64 equispaced samples. Noise-only data were available for 56 sets of this size, from which statistics of Fourier phase and magnitude were obtained for each frequency. This process was included since it simulates real situations in which noise samples are plentiful. It is assumed that the mean value of the signal is known.

For the maximum entropy solution, it is assumed that the signal is a constant, a condition of minimum prior knowledge, apart from the constraint that the signal be positive. For the minimum cross-entropy solutions, it is assumed that the signal is known exactly, so it is of the given form unless otherwise indicated by the data. The aim of the recovery process in these cases is not so much to obtain the assumed correct result as it is to evaluate differences between the expected signal and that produced from a particular data set.

Figure 1. Plots of the original data and of signal representations obtained under various constraint and prior knowledge conditions. The true signal is shown in each plot.

The upper left quadrant of Fig. 1 shows the signal and the signal-plus-noise data which served as the signal representation. The upper right quadrant shows a maximum entropy solution with "loose" constraints, that is, constraints in the Fourier domain which were chosen to represent several standard deviations in the estimated noise power and phase. (A more exact definition is given below.) The solution consists largely of a single low-frequency component. Sharp edges are not indicated in this "maximally smooth" solution.

The lower right quadrant of Fig. 1 shows the minimum cross-entropy solution for the same constraint bounds but with exact prior knowledge. The peaks and edges of the reconstruction match those of the signal. The standard deviation of the error is 0.017 and the maximum absolute error is 0.051, whereas for the original data these values were 0.114 and 0.319. The noise power has been reduced by a factor of 50.

The general appearance of this reconstruction is similar to that obtained from the Wiener filter (not shown) with exact knowledge of the signal and noise spectra, except that the Wiener filter solution is slightly displaced to the right and lacks edge definition. For the Wiener filter solution the maximum absolute error is 0.058 and the standard deviation of the error is 0.031, which represents a reduction in noise power by a factor of 14. In this case the Wiener filter solution is a good initial estimate for the reconstruction with loose constraints. The main improvement accrues from phase shifts.
The lower left quadrant of Fig. 1 shows the effects of tightening the constraints in the Fourier domain so that more credence is afforded the data relative to the prior knowledge. In this case a high-frequency component is evident. The standard deviation of the error is 0.059, which represents a noise power reduction factor of 3.7.

Generally, Fourier components are reduced by the minimization
process to their lower bounds provided they are not otherwise constrained by
the prior knowledge or by restrictions in the spatial domain. For the "loose"
constraints, the lower bound of the magnitude was zero unless the Fourier
magnitude squared exceeded 7 times the noise power standard deviation. Since the
noise power at a given frequency is distributed as a chi-squared variable
with mean equal to a, the probability of a non-zero lower bound was roughly
1 in 1000. For the "tight" constraints this probability was roughly 1
in 50. The sample noise power at the high frequency which is evident
in the solution for tight constraints was 6.8 times the noise mean. A peak
of this magnitude is expected to occur roughly once in 1000 cases. At this
frequency the tight constraints imposed a lower bound well above zero, so
that this component appears as part of the solution. For loose
constraints, the lower bound was zero and the contribution from this
frequency was much reduced.
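
The quoted probabilities can be checked under one assumption added here: that the noise power at a single interior frequency is chi-squared with two degrees of freedom (independent Gaussian real and imaginary parts), i.e. an exponential variable whose standard deviation equals its mean. The probability of exceeding k times the mean is then exp(-k). The function names are hypothetical.

```python
import math


def exceed_prob(k):
    """P(noise power > k * mean) for a 2-degree-of-freedom chi-squared
    (exponential) variable, whose standard deviation equals its mean."""
    return math.exp(-k)


def threshold_for(prob):
    """Multiple of the mean noise power that is exceeded with probability `prob`."""
    return -math.log(prob)


# "Loose" threshold of 7 standard deviations (= 7 means here): about 1 in 1100.
print(f"k = 7.0: 1 in {1 / exceed_prob(7.0):.0f}")
# Observed peak at 6.8 times the noise mean: about 1 in 900.
print(f"k = 6.8: 1 in {1 / exceed_prob(6.8):.0f}")
# Threshold that reproduces the "tight" rate of roughly 1 in 50.
print(f"1 in 50 -> k = {threshold_for(1 / 50):.2f}")
```

Under this assumption the "tight" rate of 1 in 50 corresponds to a threshold near 3.9 times the mean noise power.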

One might conclude that the data incorrectly indicate such a
component as being part of the signal, but in practice a better view is
that such a component is present at a certain level of confidence. If the
analyst has prior knowledge regarding the frequency composition of the
signal, this knowledge should be incorporated into the constraints. If the
signal were thought to have no isolated narrow-band components, then the
bounds on the Fourier magnitudes could incorporate several adjacent
frequencies, perhaps through an averaging procedure, to reduce the standard
deviation of the noise power estimates. On the other hand, if narrow-band
components are not only possible but also of interest, then it would be
incorrect to treat the components in groups in this manner. Flexibility in
the imposition of constraints is a major benefit of treating the data by the
proposed method.
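
The benefit of grouping adjacent frequencies can be illustrated by simulation. Assuming white Gaussian noise, each interior periodogram value is exponential (relative standard deviation 1), and averaging M independent adjacent values should reduce the relative standard deviation to about 1/sqrt(M). The function name and sample sizes below are illustrative.

```python
import numpy as np


def grouped_power_cv(n_samples, group, seed=0):
    """Relative standard deviation (std / mean) of noise power estimates
    formed by averaging `group` adjacent frequency bins of white noise."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_samples)
    # Drop DC and Nyquist, whose powers have only one degree of freedom.
    power = np.abs(np.fft.rfft(noise)[1:n_samples // 2]) ** 2
    n_groups = len(power) // group
    averaged = power[: n_groups * group].reshape(n_groups, group).mean(axis=1)
    return averaged.std() / averaged.mean()


for m in (1, 4, 16):
    print(m, round(grouped_power_cv(2 ** 16, m), 3), round(1 / np.sqrt(m), 3))
```

The price of the smaller spread is frequency resolution: an isolated narrow-band component is diluted by the other bins in its group, which is why grouping is inappropriate when such components are of interest.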

IX.  CONCLUSIONS 

It has been shown that a general-purpose minimization method based
on quasi-Newton search techniques can be applied to the recovery of signals
from noisy data when noise-only data are available for estimation of the
relevant statistics. A finite Fourier transform is applied to the data, and
constraints consistent with the noise statistics are applied to the Fourier
coefficients. By requiring that the data be positive, the target function
can be chosen to be the generalized entropy. The method then yields smooth
solutions in the absence of prior knowledge (the maximum entropy solution)
or solutions which tend to conform smoothly to the prior-knowledge solution
(the minimum cross-entropy solution).
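
A minimal sketch of this kind of computation follows, under simplifying assumptions that are ours and not the report's: the unknowns are the positive signal samples rather than the Fourier coefficients; only the power of each orthonormal DFT coefficient is bounded (no phase constraints); bound violations are handled by a quadratic penalty with a hypothetical weight `weight` instead of hard constraints; and SciPy's L-BFGS-B stands in for the quasi-Newton search. The gradient is derived analytically. The names `objective` and `reconstruct` are illustrative.

```python
import numpy as np
from scipy.optimize import minimize


def objective(p, prior, lo, hi, weight):
    """Generalized cross-entropy of p relative to `prior`, plus a quadratic
    penalty on DFT powers outside [lo, hi]. Returns (value, gradient)."""
    # sum(p log(p/m) - p + m) is >= 0 and is zero only at p == prior.
    log_ratio = np.log(p / prior)
    value = np.sum(p * log_ratio - p + prior)
    grad = log_ratio.copy()

    # Orthonormal DFT coefficients P_k and their powers s_k = |P_k|^2.
    P = np.fft.fft(p, norm="ortho")
    s = np.abs(P) ** 2
    below = np.maximum(lo - s, 0.0)  # shortfall under the lower bound
    above = np.maximum(s - hi, 0.0)  # excess over the upper bound
    value += weight * np.sum(below ** 2 + above ** 2)

    # Chain rule: d(penalty)/ds_k = c_k, and
    # ds_k/dp_n = 2 Re(conj(P_k) exp(-2 pi i k n / N)) / sqrt(N).
    c = 2.0 * weight * (above - below)
    grad += 2.0 * np.real(np.fft.fft(c * np.conj(P), norm="ortho"))
    return value, grad


def reconstruct(data, prior, lo, hi, weight=10.0):
    """Quasi-Newton (L-BFGS-B) search; simple bounds keep every sample positive."""
    x0 = np.maximum(data, 1e-6)
    result = minimize(objective, x0, args=(prior, lo, hi, weight), jac=True,
                      method="L-BFGS-B", bounds=[(1e-9, None)] * len(data))
    return result.x
```

With `lo` all zero and `hi` all infinite no bound is active, so the minimizer is the prior itself, a convenient sanity check. Passing a constant `prior` corresponds to the maximum entropy case. Reproducing the report's hard magnitude and phase constraints would need a constrained optimizer or a change of variables.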

A major advantage of the method is the flexibility with which
constraints may be imposed in either the frequency domain or the temporal
(or spatial) domain. This allows prior information to be applied directly
in the solution process. A second major advantage is the adaptability of
the method with regard to the degree of credibility assigned to the signal
data. Tight bounds on the constraints yield solutions which conform closely
to the data. Loose bounds yield solutions which closely resemble the
expected results. Thus the analyst has a full range of solutions from which
to choose the one most appropriate for the particular problem.
The selection of the appropriate constraint levels may be
considered as a problem in statistical analysis, and levels of confidence may
be applied to solutions. This was demonstrated by an illustrative example.


The minimization process requires the derivatives of the generalized
entropy with respect to the Fourier coefficients. These were obtained
analytically and the results were studied for two cases of interest. It was
shown that phase is important not only for cross-entropy solutions but also
for maximum entropy solutions when the noise is correlated. The analytical
results indicate that the minimization process tends to drive the power
spectral coefficients to their lower bounds. Thus it is expected that
solutions will resemble those obtained by the simple noise subtraction
method given by Eq. (8) unless otherwise constrained.
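
Eq. (8) is not reproduced in this section. Purely as an assumption, the sketch below shows a common form of power spectral subtraction: subtract the expected noise power from each Fourier power, clip at zero, and keep the phase of the data. Whether this matches Eq. (8) exactly is not confirmed here, and the name `power_subtraction` is hypothetical.

```python
import numpy as np


def power_subtraction(data, noise_power):
    """Hypothetical noise subtraction: |S_k|^2 = max(|X_k|^2 - noise_power_k, 0),
    with the phase of X_k kept. `noise_power` (scalar or one value per rfft bin)
    must be in the units of |np.fft.rfft(data)|^2, e.g. N * variance for white noise."""
    X = np.fft.rfft(data)
    power = np.maximum(np.abs(X) ** 2 - noise_power, 0.0)
    # Real, non-negative gain per bin, so the phase of each coefficient is unchanged.
    gain = np.sqrt(power) / np.maximum(np.abs(X), 1e-300)
    return np.fft.irfft(X * gain, n=len(data))
```

With zero noise power the data are returned unchanged; as the noise power grows, more components are driven to zero, mirroring the tendency of the minimization to settle on lower bounds.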


The examples given pertain to one-dimensional variables, but the
results can be extended to images [1]. A major disadvantage of the method
is the computational time required. For this reason, studies are proposed on
special-purpose minimization processes applicable to a more restrictive
class of signals, for example, severely band-limited signals.